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Abstract 

A fundamental property of cell populations is their growth rate as well as the time needed for cell division and its variance. 
The eukaryotic cell cycle progresses in an ordered sequence through the phases Gi, S, G?, and M, and is regulated by 
environmental cues and by intracellular checkpoints. Reflecting this regulatory complexity, the length of each phase varies 
considerably in different kinds of cells but also among genetically and morphologically indistinguishable cells. This article 
addresses the question of how to describe and quantify the mean and variance of the cell cycle phase lengths. A phase- 
resolved cell cycle model is introduced assuming that phase completion times are distributed as delayed exponential 
functions, capturing the observations that each realization of a cycle phase is variable in length and requires a minimal time. 
In this model, the total cell cycle length is distributed as a delayed hypoexponential function that closely reproduces 
empirical distributions. Analytic solutions are derived for the proportions of cells in each cycle phase in a population 
growing under balanced growth and under specific non-stationary conditions. These solutions are then adapted to describe 
conventional cell cycle kinetic assays based on pulse labelling with nucleoside analogs. The model fits well to data obtained 
with two distinct proliferating cell lines labelled with a single bromodeoxiuridine pulse. However, whereas mean lengths are 
precisely estimated for all phases, the respective variances remain uncertain. To overcome this limitation, a redesigned 
experimental protocol is derived and validated in silico. The novelty is the timing of two consecutive pulses with distinct 
nucleosides that enables accurate and precise estimation of both the mean and the variance of the length of all phases. The 
proposed methodology to quantify the phase length distributions gives results potentially equivalent to those obtained 
with modern phase-specific biosensor-based fluorescent imaging. 
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introduction 

The cell cycle is one of the most fundamental processes in 
biology. Through this process, a parental cell transmits to its two 
daughter cells genetic and epigenetic information by accurately 
replicating its DNA and evenly apportioning all nuclear and 
extranuclear contents. The mechanism of cell cycle regulation is 
tailored to ensure accurate cellular content replication, but seems 
to be less constrained by how long it takes to complete this process 
successfully. Several check points exist that ensure that chromo- 
somes are faithfully copied and that the parental cell has enough 
material in order to produce two viable isogenic daughter cells. 
Meeting the conditions of each of these check points takes variable 
time and delays the completion of the cell cycle. Yet, how long the 
cells take on average to complete the cell cycle is an important 
biological property. In unicellular organisms, the average inter- 
mitotic time is a direct measurement of the organism's fitness, 



while in multicellular organisms, the regulation of the rate of cell 
division is critical for development, stem cell maintenance, tissue 
or organ homeostasis, wound healing, and immunity. The 
temporal organization of the cell cycle is therefore under tight 
regulation, likely reflecting a fine balance between accuracy in 
information transmission and speed. 

The average cell cycle time has been estimated at the 
population level by measuring the growth curve of exponentially 
proliferating cell cohorts, under conditions in which cells can be 
counted and cell death is negligible compared to the population 
wide growth rate. Under conditions in which cell counting is not 
possible or in which cell death rates cannot be neglected (e.g., 
homeostasis, immune reactions, cancer growth), indirect estimates 
for the average division time or the average death time are 
typically inferred e.g., through the rate of increase of cells arrested 
in mitosis after administration of colchicine, the fraction of labelled 
mitotic figures after pulse labelling (FLM method), and from long- 
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Author Summary 

Among the important characteristics of dividing cell 
populations is the time necessary for cells to complete 
each of the cell cycle phases, that is, to increase the cell's 
mass, to duplicate and repair its genome, to properly 
segregate its chromosomes, and to make decisions 
whether to continue dividing or enter a quiescent state. 
The cycle phase times also determine the maximal rate at 
which a dividing cell population can grow in size. Cell cycle 
phase completion times largely differ between cell types, 
cellular environments as well as metabolic stages, and can 
thus be considered as part of the phenotype of a given 
cell. Our article advances the methods to quantitatively 
characterize this phenotype. We introduce a novel phase- 
resolved cell cycle progression model and use it to 
estimate the mean and variance of the cycle phase 
completion times based on nucleoside analog pulse 
labelling experiments. This classic workhorse of cell cycle 
kinetic studies is revamped by our approach to potentially 
rival in accuracy and precision with modern phase-specific 
biosensor-based fluorescent imaging, while superseding 
the latter in its application scope. 



term labelling and delabelling time-series of deuterium or 
bromodeoxyuridine (BrdU) tracing experiments [1-3]. For grow- 
ing cell populations these estimates depend on assumptions about 
the shape of the intermitotic time distribution [4] . The latter, when 
analyzed at a single-cell level, e.g., by time-lapse imaging, shows 
significant variability in otherwise seemingly homogeneous cell 
populations. This observation led more than forty years ago to the 
development of one of the first stochastic cell cycle models [5] . 
Smith and Martin proposed at that time that cell's life 
comprehends an A state and a B phase. Whereas the time cells 
spend in the A state was assumed to be exponentially distributed, 
the time cells spend in the B phase was, in this simplest scenario, a 
frxed delay. Experimental validation was provided by time-lapse 
imaging of growing cell cultures, measurements of fraction of 
labelled mitoses and fractions of sibling pairs with age difference 
greater than a specified value [6]. Even though later studies [7-10] 
have shown that the model assumptions do not exactly match 
experimental data, its simplicity and mathematical tractabHity 
makes the Smith-Martin model even today a popular theoretical 
model [6,11]. 

In the last ten years, 5-(and 6)-Carboxyfluorescein diacetate 
succinimidyl ester (CFSE) dilution assays in concert with a whole 
set of advanced modeling techniques [12-14] allowed to estimate 
the average duration, as well as inter-cellular variability in more 
complex scenarios with division time densities in vitro or in vivo 
after adoptive cell transfer. Especially generation structure, 
activation times and generation dependent cell death were 
included in these models and subsequently estimated in the 
context of lymphocyte proliferation. Inter-cellular variability not 
only of division times but also of death times were confirmed 
directly in long-term tracking of single HeLa cells [15] and B- 
lymphocytes [10]. The latter study provided extensive quantitative 
data on the shape of age-dependent division and death time 
distributions which are required to calibrate e.g., the Cyton [16] or 
similar models. A review on these, and alternative stochastic cell 
cycle models is given in [4]. 

At a higher temporal and functional resolution the eukaryotic 
cell cycle is structured into four distinct phases: 1) the Gi phase 
during which organelles are reorganized and chromatin is licensed 
for replication, 2) the S phase in which the chromosomes are 



duplicated by DNA replication, 3) the G2 phase which serves as a 
holding time for synthesis and accumulation of proteins needed in 
4) the M phase, or mitosis, which is marked by chromatin 
condensation, nuclear envelope breakdown, chromosomal segre- 
gation, and finally cytokinesis, which completes the generation of 
two daughter cells in Gi phase [17]. 

Considering explicitly cell cycle phases in mathematical 
models of cell division probably dates back to the discovery 
that DNA is replicated mainly during a specific period of the 
cell cycle. Already in their seminal paper. Smith and Martin 
related the A state to the Gi phase and the B phase to the S, G2, 
M and possibly to some part of the Gi phase. Subsequent 
studies that explored phase-resolved cell cycle models, major- 
itarely rooted in the field of oncology and cancer therapy, 
include [18-25]. As in the present work, most of these studies 
relied on flow cytometry (FACS) data generated by labelling 
selectively cells that are synthesizing DNA using nucleoside 
analogs (e.g., BrdU, iodo-deoxyuridine (IdU) or ethynyl- 
deoxyuridine (EdU)), together with a fluorescent intercalating 
agent to measure total DNA content (e.g., 4,6- diamidino-2- 
phenylindole (DAPI), and propidium iodide (PI)), in order to test 
the model assumptions and draw conclusions about the cells and 
conditions under consideration. 

Here we present a simple stochastic cell cycle model that 
incorporates temporal variability at the level of individual cell 
cycle phases. More precisely, we extend the concept underlying 
the Smith-Martin model of delayed exponential waiting times to 
the cell cycle phases. We first demonstrate that the model is in 
good agreement with published experimental data on inter-mitotic 
division time distributions. We then show, based on stability 
analysis, that phase-specific variability remains largely undeter- 
mined when measurements are taken on cell populations under 
balanced growth (i.e., growth under asymptotic conditions in 
which the expected proportions of cells in each phase of the 
cycle are constant). We prove that by properly measuring 
proliferating cells under unbalanced growth, one can with at 
least three well placed support points, assuming noise-free 
conditions, uniquely identify the average and variance in the 
completion time of each of the cell cycle phases. When 
comparing our model with two experimental data sets obtained 
from conventional pulse-labelling experiments of distinct pro- 
liferating cell lines, we find that, while the kinetics extracted 
from these experiments are well approximated by the predic- 
tions of the proposed model, the information content is 
insufficient to determine accurately all the parameters. Finally 
we propose a modification of the prevailing experimental 
protocol, based on dual-pulse labelling with BrdU and, for 
example, EdU, that overcomes this shortcoming. 

Results 

Model definition 

The eukaryotic cell cycle is defined as an orderly sequence of 
three phases distinguished by cellular DNA content, termed Gi , S 
and G2M. A dividing cell is supposed to proceed, under this 
minimalist view, from one phase to another in a fixed order, until 
reaching the end of G2M phase. Here it completes cytokinesis 
generating two genetically identical daughter cells that are by 
definition in Gi phase (Fig. 1 A). We assume that the completion 
time of any phase (i.e. the time lapse between the entry to and exit 
from that given phase) is a random variable T, which is distributed 
according to a delayed (or shifted) exponential density function 
(Fig. 1 B), 
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Figure 1. Stochastic cell cycle model. A: Scheme of the proposed 
cell cycle model with three phases Gi, S and GiM. The dashed border 
between the G2 and the M phase indicates that the G2 and M phase 
are pooled into a single phase. The random time t a cell needs to 
complete the processes associated to each of the phases, follows a 
delayed exponential distribution with specific parameters a. and \i for 
each phase. B: Delayed-exponential completion time distribution 
density with parameters a and /i. C: Best fit of the complementary 
cumulative distribution Rj to the fraction of undivided cells after birth 
obtained by time lapse cinematography [5] of slow and fast dividing 
cell lines. D: Best fit of defined by Eq. 4 (solid line) to inter-mitotic 
time distribution density measured by long-term video tracking of m 
vitro proliferating B-cells [10]. The data in C and D were read from the 
graphs in the original publications ([5] and [10] respectively). 
doi:1 0.1 371/journal.pcbi.1 00361 6.g001 



(1) 



where a is the reciprocal of the rate of the exponential (measured 
in time units) and /? is the frxed delay (in time units), and H 
denotes the Heaviside step function whose value is zero for 
negative argument, i.e., for T</?, and one for positive argument. 
Notice that with a slight abuse of notation we denote here the 
random variable (subscript of density function /) and the value it 
assumes (the argument of the function f-^) by the same symbol T. 
This will allow us to denote the probability density function and 
the cumulative probability distribution of the random variable x 
by fx and F-^ respectively, and to define the complementary 
cumulative distribution = 1 — Fj. The delay /J in Eq. 1 'ensures' 
that a cell that enters a specific phase wiU remain therein for at 
least P time units (e.g. hours) before proceeding to the next phase. 
Besides this fixed minimal time /?, additional less predictable 
effects that affect the completion of the processes associated to a 
phase are assumed to be exponentially distributed with both mean 
and standard deviation given by oc. The phase specific mean 
completion time, denoted in the following by T is then oc + p with 
standard deviation a and coefficient of variation a/i. The Laplace 
transform of Eq. 1 is given by 



a„{/;(t)}= 



1 +C!(Co' 



(2) 



where tu is the transformed variable corresponding to the time 
lapse 1. The temporal organization of the cell cycle is defined by 
the vector of phase-specific completion times, t = {tg, ,ts,tg,m}, 
which in turn depend on the parameter vectors a = {tzo, ,as,aG2M} 
and )? = {/^Gi '/^s>/^G2m}- The cell cycle length, understood as the 
time lapse between the entry into Gi until exit out of G2M, is the 
random variable 7" = Tg, +Ts + TG2M- Its probability density 
function is the convolution of the three underlying delayed 
exponential distributions and corresponds to the delayed hypoex- 
ponential distribution. Explicit expressions can be computed using 
the inverse Laplace transform £ ^ ' of the product of the Laplace 
transforms of the three densities given by Eq. 2, i.e.. 



fT(T) = C, 



/ 1 + a, CO 



with i = Gi, S, G2M. 



(3) 



In case that aU entries in a are distinct, we get 



fT(r)=J2 



( \ 



H(T-E), (4) 



in which the indices i and j iterate over the three phases and B is 
the sum of the elements in p. 

In Fig. 1 B we plot the shape of the phase specific completion 
time distribution f defined by Eq. 1, which illustrates that the 
probability for a cell to complete a given phase in less than time 
units is zero under this model. A graphical representation of the 
cell cycle model is provided in Fig. 1 A. Notice that each phase can 
have distinct parameter values a and ji for the completion time 
distribution. 

As a first validation, we compared the empirical frequency of 
undivided cells as a function of time after 'birth' (reported by [5]) 
with the respective probabihty according to the model X—Fj, 
which we denote as Rt (Fig. 1 C). As a second test, we fitted the 
cell cycle length density fr given by Eq. 4 to data extracted from 
video-tracking of m vitro proliferating B cells [10]. The delayed 
hypoexponential distribution fx (shown in Fig. 1 D), but also the 
delayed log-normal and the delayed gamma distribution (not 
shown) with parameter values proposed in [10], reproduce closely 
the measured division time histogram. While the two latter depend 
on three parameters each, the hypoexponential distribution 
depends on six parameters, that remain largely undetermined 
given this kind of data. 

Balanced growth 

A proliferating cell population that obeys the probability model 
specified in the previous section can be represented by a non- 
IVlarkov multidimensional random process, whose evolution 
depends on its history. There exist an infinite number of possible 
histories or realizations of the population size dynamics N(t). We 
focus here on a specific important subset, namely those under 
balanced growth. Under balanced growth a cell population grows 

exponentiaUy E[N{t)\ = E[Na^ {t) + N^{t) + A^g^m (01 oc e'" with 
mean growth rate j.i and constant mean proportions of cells in 
the three phases « = {«Gi i^Si^Gim}; where e.g., 

nG^= E[Na,{t)/{NG,{t) + N^(t) + Na^M{t))]. The expectation 
operator E\\ is defined over all possible realizations of the process. 

We will now derive explicit expressions for mgi , 'is and «g,m 
and a transcendental equation that defines ji, the growth rate. A 
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first step in obtaining the constant fi'equencies of cells in each of 
the phases consist in computing the ratio between the cells that 
complete a given phase and the total number of cells inside the 

same phase at time t. This phase-specific quantity, denoted here 
by y, represents the asymptotic efflux rate constant, which will be 
useful, as we will see, to construct a transition probability matrix 
Q. The latter will enable us to employ methods from hnear algebra 
to solve the steady state condition. 

Suppose for example that a cohort of cells entered a given phase 
at time Then the density of cells leaving this phase at time t„ut 
will be fzitout — tin)- Similarly if a cohort of cells entered this phase 
at time then a proportion R^^t — tin) will remain in it until time 
t. 

Recalling that the influx of cells into a given phase is 
proportional to e^' and that Rzit) is the complementary 
cumulative distribution of T, (1— i^x(0)> which is Laplace 
transformed to £o){i-—F^{t)} = {l—£o){fiit)})/(o, we integrate 
over all past entries and finally take the ratio to obtain 



(5) 



(6) 







" 2 xO-I/c/GiVgi /*+!))) " 


Ms 




1 — "Gi —nojM 


«G2M 




_-lx(l-/G2M''(aG2^^+l))_ 



(10) 



The uniqueness and existence of a dominating positive real root 
ultimately motivates our focus on balanced exponential growth, as 
any immortal proliferating cell population with sufficient nutrients 
and space wiU eventually enter this stationary phase. The time it 
takes, either starting with a single cell or a synchronized cell cohort 
to enter this state depends on the cell cycle parameters. The 
exponential growth rate fi is the unique real positive root of the 
characteristic equation det(Q — /xl) = 0 which writes as 



/|3(2-n,eft^(l-Ha,/i)) 



= 0. 



(11) 



It is easy to see that the denominator in Eq. 1 1 is always 
positive. To determine a non- trivial fi it remains to solve the 
transcendental equation in the numerator 



2- neft"(l-ho(,/j) = 0. 



(12) 



(7) 



While the second equality is a consequence of tiic definition of 
the Laplace transform, the third equahty follows by substituting 
Cfi{fx{T:)} using Eq. 2. For a phase without a delay, i.e., p = 0, the 
last expression simplifies to the familiar mass action principle, 
where the transition probability is directly proportional to the 
decay rate 1/a. Assuming that i:ells are immortal and recalling 
that division occurs as cells proceed from G2M to Gi we build up 
the transition probability matrix as follows 



Q-- 



0 



0 



2yG, 



2M 

-7s 0 

Ys ~yG2M 



(8) 



The balanced growth condition can now be formulated in 
matrix form 



(9) 



where the growth rate is an eigenvalue of Q and the 
proportions vector n = {«Gi> "s, KGzm} is the corresponding 
eigenvector. It can be shown that there exists a single dominating 
real pc)siti\ (' cigcm alue for Q (see Materials and Methods) whose 
associated normalized eigenvector is 





' «Gi 




" «Gi 


Q 






«S 




«G2M 




"02 M 



Numericcil solutions to this equation can be computed using 
e.g., the Newton-Raphson root finding algorithm, with fast 
convergence if the initial value is set to ^0= log (2)/ 2", where T 
is the average cell cycle length, i.e., the sum of the elements in 
t = {tgi,ts,tg2m}- This first guess is a naive estimate for pi 
assuming that cells divide according to a deterministic division 
time identical to the average of the hypoexponential density 
defined in Eq. 3. 

Learning from cell frequencies measured under balanced 
growth 

The predicted fractions of cells in each of the phases can be 
compared to frequencies extracted experimentally from bivariate 
analysis of cell populations transiently (exposed to nucleoside 
analogs and subsequentiy examined both for the intensities of the 
signals due to incorporated nucleoside analog and total DNA 
content [26] (e.g. the so called BrdU-DAPI staining dot plot). The 
question that we want to address in this section is: What can 
potentially be learned about the parameters of the model, given 
this type of experimental data? By definition, the measured 
frequencies will sum to one, and therefore we have for three 
populations effectively only two equations but six model param- 
eters. This makes it impossible to identify all the parameter values, 
irrespective of the number of samples we take. It is however 
possible to derive analytical expressions for the upper and lower 
bounds for both the parameters and the average completion time 
of each phase. 

Consider the experimentally determined frequencies, denoted 
by « = {«G|, WGjm}- Substituting the vector n by H in Eq. 10 
and solving for each phase specific parameter a, we obtain 



(13) 
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where k is a phase specific element of the vector 
2 2-«G, 



2-KGi ' 1+"G2 



1 +nG2M 



(14) 



The phase specific parameters a and p, respectively the 
reciprocal rate and delay, are by definition greater or equal to 
zero. These conditions propagate into Eq, 13 which allows us to 
specify boundaries for a and p. First notice that a is, for each 
phase, a monotonicaUy decreasing function of p with a maximum 
(k—1)/h at p = 0 and a zero crossing at p=\n(K)/fi. The 
maximum and the root represent the upper bounds for a and P 
respectively, while the lower bounds are zero for both. We thus 
have for each phase 



a6[0, (k-1)/ju] and ;S6[0, ln(K)/^j]. 



(15) 



The mean phase-specific completion time, T, the sum of the 
reciprocal rate a and the delay p, is also bounded, with an interval 
given by 



T = (a+p) e [ln(K)//j, (k-1)//i]. 



(16) 



This result is derived from the fact that (a -I- P) is concave having 
its unique minimum at /? = /m(k) / /.(, which follows from setting the 
derivative 5^(a + /?) = 1 — Ke^'*'' to zero. This implies that (a+P) 
is a monotonicaUy decreasing function in the interval 
P e [0, In (A)/ fi] with the corresponding extrema specified above. 
It is important to note that the intervals defined by Eqs 13-16 
depend on the average growth rate n which is in general not 
known. Formally if one specific pair of parameter vectors a and P 
explains the measured frequencies with growth rate ji, the scaled 
parameter vectors ca and cfi mimic equally well the same data for 
arbitrary positive c, however with a reduced growth rate fi/c. This 
can be easily verified by substituting these expressions in Eq. 10 
and Eq. 12. The direct consequence is that fi remains undefined. 
However for the relative average time a cells spends e.g., in Gi 
phase (kgi +Pqi)/T the growth rate cancels out. 

Using the fact that Ke] 1,2[ and the appropriate series expansion 
for the natural logarithm, the widths of the intervals bounding a, p 
and (a + p) for each phase can be written as: 



1 



1. 



(-1) 



(K-iy, 



1^1=1 ' 



(17) 



i=2 



I 



From this it is straight-forward to show that iv^ > ii'/j > H'^^/j. 
This imphes that by using measurements of the phase-specific 
stationary cell frequencies to infer the phase-specific completion 



times T results in estimates of the mean value a-^P that are more 
precise than the estimates of the standard deviation a. Notice that 
the width of the intervals can be interpreted as a naive lower 
bound for the uncertainty about the respective parameter values. 
For the two data sets analyzed in this article (see details in next 
section), we computed the intervals for the phase-specific standard 
deviations that were on average ~ 10 times wider than the 
intervals for the expected phase-specific completion times Wa,Jrfi- 

Transient unbalanced growth 

Balanced growth analysis does not allow to distinguish between fixed 
(a = 0) and purely exponentially distributed {P = 0) completion times 
even if is known. This follows from Eq. 1 5 because possible values for 

the standard deviation a include 0 and (k— 1)//J, and the latter 
requires, according to Eq. 16, the delay p to be null. 

The incapacity to resolve the values of a and p is overcome if 
one selects and follows a subpopulation within which the 
proportions of cells in each phase are transientiy different from 
the balanced growth proportions. Consider a simple thought 
experiment that consists in taking a population under balanced 
growth and labelling all the cells that are in a specific phase, say o, 
which can be either Gi, S or G2M. Initially all the cells are in the 
same phase o, but as time passes by the labelled cells progress 
through the cell cycle and eventually distribute over the three 
phases. The labelled cell subpopulation which is initially not 
balanced will return asymptotically to balanced growth conditions, 
restoring the corresponding proportions of cells in the three 
phases. We refer to this transient dynamics of a selected 
subpopulation as transient unbalanced growth. It turns out that 
measuring the transient dynamics of this subpopulation yields 
information that potentially allows to distinguish between a fixed 
and a purely exponentially distributed phase completion time. 
More specifically, a mathematical proof will show that taking 
samples at three well chosen time points (support points) permits 
under ideal conditions accurate estimation of the average and the 
variability in the time required to complete the phase o. 

The initial average fraction of cells in phase o which are 
selectively labelled at time to is determined by Eq. 10. To predict 
when the labelled cells wiU have completed o, we need to specify 
when they entered this phase. For the time before labelling the 
average influx into o is proportional to e*". For the time after the 
labelling, because by definition all labelled cells entered phase o 
before to (otherwise they would not be labelled 'as being in phase 
o'), the entry of cells is zero. Hence, the average influx to the 
labelled subpopulation is proportional to H{to — t)e'", where H 
denotes the Heaviside step function. Let us assume that within the 
subpopulation of labelled cells and their progeny one could 
identify how many phases a cell or a cohort of cells went through 
since the labelling event, and let P count the number of phases 
since labelling. 

In close analogy to expression Eq. 5 we compute the time- 
dependent exit-rate density distribution for cells with P = 0 as 



f_^^Hito-x)e^%„{t-x)dx 
'0^1 j^o^ Hito-x)et'-R,„{to-x)dx 



We^^%,{t-x)dx 
fo^ei'^Rr„ito-x)dx 

HC^{f,„iT+t)} 



(18) 



1-£,{/;„(t)} 



for to=0. 
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where, for convenience, we interpreted and will interpret in the 
following o both as a phase and a phase index. As before, the third 
row follows from the definition of the Laplace transform setting 

?o = 0.. On the left-hand side, the arrow from 0 to 1 represents the 
transition from the initial phase o (P = 0) to the next phase (P= 1), 
corresponding to the completion of the initial phase o. In contrast 
to Eq. 5, the denominator accounts for the cells that entered or 
initiated phase o sometime in the past, and did not complete this 
phase until the instant of labelling tg (and not at time t as in Eq. 5), 
while the numerator, except for the altered average influx, remains 
unchanged. 

After computing £n{f^{i + 1)} and substituting ^^{/^(t)} using 
Eq. 2, Eq. 18 yields for t>to 



( 



fie 



\(aofi+l)efoi^-l 



t>to + Po 



(19) 



It follows that the accumulated average cell flux that at time t 
has completed o and progressed to the next phase is given by 



0^1 



(0 = 



{x)dx. 



(20) 



which for r->oo approaches one, reflecting the fact that all cells 
will eventuaUy complete o. 

The Laplace transform of Eq. 20 writes as 



def 



p mod 3) 

4'i+(pmod3) 
\4'2 + (pmod3) 



if 0 = Gi 

ifo = S 
if o = G2M 



where mod is the modulo operation, and 0 = {Gi, S, G2M} is 
a vector of cell cycle phase indices. The function q(o,p) thus 
returns, for increasing p, in a cyclical fashion, the ceU cycle phase 
indices, starting with o for p = 0. Notice that Eq. 21 corresponds to 
Eq. 22 for o = S and P=l. 

Analytical expression for Eq. 22, although solved relatively 
easily with modem algebra software, can become quite cumber- 
some for values of P larger than six. In our case, deriving the 
expressions for /> up to a value of five was suflicient to simulate the 
experiments. 

Because we want to compare the model predictions with 
experimentaUy measured cell frequencies, more interesting than 
the accumulated fluxes are the expected proportions of cells inside 
each phase over time. These can be computed using Eqs 20-22, 
closely following the methodology outiined in [11,12]. For the 
fraction of cells initially in phase o we have 



x(l- 



(23) 



where the lower index 0 in «q(0 indicates that this expression 
describes cells which completed zero phases since to. The first term 
on the right hand side corresponds to the fraction of cells in phase 
o at to divided by e^', which accounts for the total population 
growth during the same interval. The second term stands for the 
fraction of cells that remained in phase o up to time t relative to 
the initial number of cells in this phase. By evaluating the integral 
in Eq. 20, substituting in Eq. 23 and letting as before, without loss 
of generality, the time of partition /q be zero, we get for t>0 



where w is, as before, the transformed variable corresponding to 

t. 

Within a cohort of cells isolated for instance in S phase, i.e., 

o = S, the accumulated average cell flux out of the subsequent 
G2M phase can then be derived recalling Eq. 2 and using the 
properties of the inverse Laplace transform as 



rl^(t)=Ci4c^{rl^(t)} 



l + aG-)M<» 



(21) 



For an arbitrary cell cohort originally in o, the accumulated 
average flux, completing P phases and entering the iP+ 1)''' phase 
since isolation, can be written in general as 



r" (t) : 



-LT^IlAV" (Olxll- \, (22) 



( 



<(0= 



1- 



(l-l-a„/i)e^o''-l 



V (l-l-a„/i)e^<'''-l 



(24) 



Expressions for cells initially in S, Gi or G2M phase can be 
obtained by substituting o by the respective phase. 

If there were no cell division (i.e., /^ = 0) we could readily obtain 
the average fraction of cells that completed P phases at time t as 
the difference between the cells that entered the P'* phase, i.e., 
Tp^i^pCO) ^rid those that left it, i.e., r^^^^j(f), divided by e'". To 
account for cell division, we need to multiply this difference by an 
additional term Ip, which increases by a factor 2 each time cell 
cohorts make a transition from G2M->Gi. This term is defined, 
for each case, as follows: A^' =2LfJ, 4 = 2L^J and X°/^ = l^-^\ 
where the brackets in the exponent represent the floor operator. 

In general we get for all consecutive phases for cells initially in 
phase o the relatively manageable expression 



in which q{o,p) denotes a function which returns an appropriate 
phase index. For />eN and oe{Gi, S, G2M} it is defined as 



«;(o=- 



gUt 



xcr" (0- 



T" (0). 

p^(p-i-i)^ 



(25) 
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As for Eq. 24, the resulting solutions are defined as piecewise- 
continuous functions in time. Also notice that most expressions in 
this section can be written in more compact, but less intuitive, 
vector form, by dropping the initial phase index o and using bold 
vector notation as before. 

Learning from cell frequencies measured in transiently 
unbalanced growing subpopulations 

In this section we will show that data from the transient kinetics 
generated by our thought experiment allows to accurately estimate 
the average and the variability in the individual completion times. 
The proof is based on the analytical expressions derived in the 
previous section, and also on the assumption that the kinetics are 
acquired under the ideal conditions of large population sizes and 
no measurement errors. The latter condition, although clearly 
unrealistic, can always be approached in practice by increasing the 
number of samples at each support point. 

For the sake of generality, consider a subpopulation of cells 
that are in an arbitrary phase and are labelled at = 0. 
Assuming that the 'label' does not in any way affect the cell 
cycle of the cells, the parameters a and of the labelled 
subpopulation are the same as those of the full population under 
balanced growth. Under these conditions, we can obtain a using 
Eq. 13 and Eq. 14 with the fractions ii of the full population 
observed at time to. Substituting a in the upper row of Eq. 24 
and solving for fi to find 



l0g(K«)- log(« + (K:-l)Ko(?]0,/i[)) 



(26) 



where t]o,fi\ denotes an arbitrary time point that lies in the 
interval ]0,jS[, h =«o(0) and n o(t) is the experimentally 
determined equivalent of Eq. 24. This shows that the balanced 
growth rate n is fully determined by only two support points, one 



immediately after the partition at ? = 0 and a second at an 
arbitrary ?]o,/?[- This also makes clear that placing more support 
points in the interval ?]o_/;| does not increase knowledge about fi 
nor the parameter values, under ideal conditions. Importantly the 
uncertainty about the phase-specific variability discussed in 
previous sections remains. 

By replacing the same expression for a in the second row of the 
right-hand side of Eq. 24 we get 



■) 



n o(%oo]) = 



ne exp(;8;i)-K 

(;c-l)/(K-exp(iS/i)) 



(27) 



After experimentally acquiring yU and the phase specific K and 
h o{t\p^ oo]), this expression will depend on a single unknown jS. 
One can show that Eq. 27 is solved by a unique jS. This follows 
from the fact that the right hand side of Eq. 27 is a monotonically 
decreasing function in )Se[0, In (k)//^] with corresponding values 



lying in the interval [h exp ( - 



■ 1),0] while the left hand 



side is positive by definition. Substituting the solution for [i into 
Eq. 13 yields the remaining parameter vector a. 

Taken together this proves that in theory samples of the three 
cell cohorts Gi, S and G2M taken at three support points, a first at 
? = 0, a second at 0<?< min(jff) and a third at t> max (J?) are 
sufficient to determine all the parameters of the model. 

Conventional single pulse-labelling assays 

The thought experiment analyzed so far, although conceptually 
simpk", poses a series of experimental challenges, that make a one- 
to-one realization difficult. The technical difficulties lie mosdy in 
initially separating the cells according to their phase and in 
following these cells as they enter the subsequent phases. A widely 
used technique, namely DNA-nucleoside-analog pulse-chase 
labelling experiments, generates nevertheless to a certain extent 
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Figure 2. DAPI-BrdU pulse-chase labelling FACS data. Samples taken at several time points after pulse labelling proliferating U87 human 
glioblastoma cells with BrdU. The four gated populations are f'", fo^^' ^g, ^^^^ f'** which are defined precisely in the main text. Briefly, the subscript 
indicates the phase at the instant of labelling, while the superscripts 'u', 'lu' and 'Id' refers to cells 'unlabelled', 'labelled and undivided' and 'labelled 
and divided', respectively. The data was generated as described in the Experimental iVlethods section. 
doi:1 0.1 371/journal.pcbi.1 00361 6.g002 
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comparable data. The latter achieves the initial phase-specific 
partitioning by exposing during a short time window proliferating 
cells with a nucleoside analog (e.g., BrdU, IdU or EdU) that gets 
selectively incorporated into the DNA of cells that are actively 
replicating their genome. Measuring subsequently by FACS 
simultaneously the DNA content and the amount of incorporated 
nucleoside analog per cell permits to discern the three phases Gi , 
S and G2M immediately after the pulse. In addition, due to the 
permanent staining property of the nucleoside analogs, it is 
possible to follow, up to a certain degree, the labelled and 
unlabeUed cell cohorts over time. Several dies, such as Hoechst 
33342, the dihydroanthraquinone analog DRAQ5, DAPI, and PI 
are commonly available to stain DNA content in cells [27], and 
can be used in combination with nucleotide analogs. 

In theory, this method would largely correspond to the 
hypothetical experiment that we analyzed so far. In practice 
however, the overlap of the subpopulations in the FACS scatter 
plots prevents the exact determination of the frequencies of cells 
described by Eq. 24 and Eq. 25. For example labelled ceUs that 
have completed the S phase but remain in G2M phase are 
indistinguishable from those that did not complete the initial S 
phase yet. As has been reported previously, only four different sub- 
populations can be identified with reasonable accuracy [26]. 
These are: 

• f'": labelled undivided cells which at time of labelling (?o) were 
in S phase («q +«^) 

• fQ,]y[: unlabeUed cells that were in G2M phase at /q (w^"'^) 

• f''': first generation progeny of labelled cells which were 
initially in S phase {Ylp^i'^p) 

• : unlabeUed ceUs and progeny of ceUs that were in Gi at /q 
accompanied by the progeny of fG,|vi and f 
(l-f'"-f^....-f''') 



where the corresponding populations in our thought experiment 
are indicated in brackets. This shows that computing Eq. 25 up to 
p = 4 is sufficient to describe a complete in silico BrdU pulse 
labeUing experiment. The reason is that, using current protocols, 
fluorescence of labelled ceUs becomes indistinguishable from 
background as soon as the cells divide a second time. In other 
words, cells that leave population f'"* by dividing a second time join 
population fg^ (see Fig. 2). For the experimental data, analyzed in 
the next section, the fraction of labelled cells that completed two 
ceU cUvisions during the 1 2 hours time frame of the experiment is 
negligible. 

The population fQ,M is the only sub-population that matches 
directly the type of data considered before and its temporal 
evolution follows as such Eq. 24. The remaining three populations 
in contrast represent mixtures of ceU cohorts whose kinetics could 
be described individually by Eqs 24-25. 

Learning from single pulse-labelling data 

By analyzing two data sets from samples of BrdU single pulse- 
labelling experiments, we tested the model and the effect of 
population intermixing on the identification of the model 
parameter values. The two cell lines considered were in vitj'o 
cultured U87 human glioblastoma cancer ceUs (for detaUs see 
Materials and Methods) and in vitro cultured V79 Chinese 
hamster cells (courtesy G. Wilson). We wUl refer to these data as 
the U87 and the V79 data sets. Both data sets consist of samples 
taken from asynchronously dividing cell populations at several 
time points after a single BrdU pulse, with sample sizes ranging 




0 12 



Figure 3. Model based parameter estimation. A: Best fit of the 
model predictions (lines) to experimentally determined cell fractions 
after BrdU pulse labelling (dots). U87: In vitro cultured U87 human 
glioblastoma cancer cell line (three replicates). V79: In vitro cultured V79 
Chinese hamster cells (single replicate) (courtesy G. Wilson). Best fit 
parameter values used to compute model predictions 
(U87: a = {3.2,3.9,3.4},/( = {5.7,4.1,2.1}, V79: « = {1.6,1.1,0.5},/? = 
{1.4,7.8,1.9}, units are hours). B: Approximate ML regions for the 
parameters a. and /j associated to each phase (gray: Gi, red: S, green: 
G2M). C: Bayesian bi-variate 99%-credibility regions for the parameters 
a and [i for each phase. Arrows indicate point estimates and the dashed 
lines delineate the information that could have been gained in our 
thought experiment under noise-free conditions from two support 
points, one at / = 0 and a second at t< min(y!). The U87 data set was 
generated as described in the Experimental Methods section. The V79 
data set was a kind gift of G. Wilson. 
doi:1 0.1 371/journal.pcbi.l 00361 6.g003 



from 5000 to 50000 cells each. Data points represent 
simultaneous measurements of BrdU as well as DAPI or PI 
(DNA content) in a single cell by fluorescent activated cell 
sorting. 

As a preliminary test we minimized the residual sum of squares 
(RSS), i.e., least-squares fitting, of adequate mixtures of Eq. 24 
and Eq. 25 to extracted frequencies at different time points after 
the pulse. We found that, for properly chosen parameter values, 
both data sets were reasonably weU approximated by the model 
predictions (Fig. 3 A). 

WhUe this indicated that the model captured some of the 
relevant temporal characteristics of ceU cycle progression, a 
subsequent analysis revealed that an infinite number of different 
parameter combinations fitted the measured frequencies with the 
same minimal RSS (not shown). This implies that there exist, 
given the available data, no single best-fit parameter combination, 
but a whole region in parameter space that can explain the data 
equally weU. 

When we then interrogated the same data by approximate 
maximum likelihood (ML) estimation, using a simple ad hoc 
likelihood function (see Materials and Methods), we found again 
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that relative large regions in parameter space mapped to the same 
ML (see Fig. 3 B). It turned out that these regions were entirely 
superimposed onto the lines defined by Eq. 1 3 and Eq. 26 (dashed 
lines). These lines define what could have potentially been learned 
in our thought experiment with only two support points, one at 
7 = 0 and a second at ;<min(j8). hi both experiments, ML 
parameters associated with the Gi phase were spread out almost 
everywhere along these lines (Fig. 3 B, gray regions). Parameters 
related to the S phase were more concentrated but still in the case 
of the V79 data a substantial region of ML estimates were 
observed. Finally the region for the G2M phase parameters 
approached that of a point estimate for both data sets. 

The spread of the ML estimates suggests that even in the ideal 
case of large population size and noise-free data, the specific 
choice of the support points in these experiments does not allow to 
determine uniquely neither the delay nor the standard deviation 
for all the phases. In contrast the average completion time for each 
phase and the total division time can be estimated with relatively 
high precision. 

To better quantify the uncertainty of these estimates, Bayesian 
99% credibility regions (CR) were computed by the Markov chain 
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Figure 4. Dual pulse protocol. A: Simplified schematic representa- 
tions of the protocols corresponding to a conventional single pulse 
labelling with one nucleoside analog (e.g., BrdU) and a dual pulse 
labelling experiment with two different nucleoside analogs (e.g., BrdU 
together IdU or EdU). B: Artificial staining of single-pulse labelling data 
(for original data see Fig. 2), showing eight of the nine subpopulations 
that could potentially be identified with double-pulse labelling. Notice 
that the four population f'", f''', f'l^^ and f;^;,^ that can be followed by 
the conventional protocol, have each been subdivided according to the 
cell cycle phases. The naming convention for the populations is as 
follows: the superscript (/» = 'labelled undivided', W = 'labelled divided', 
w = 'unlabelled') indicates whether the population is labelled and 
whether it has divided since the time of the first pulse; the first and 
the second subscript (Gi, S, G2M) stand for the phase in which the 
population was at the time of the first and the second pulse 
respectively. Double subscripts are used only when necessary. 
doi:1 0.1 371/journal.pcbi.1 00361 6.g004 
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Monte Carlo method (MCMC) using the same likelihood function 
as before (Fig 3 C). CRs followed mainly the same trends as the 
regions observed in the ML estimates, covered however as 
expected a larger volume. An exception was the 'blown up' CR 
of the S phase parameter for the U87 cell line, for which the ML 
estimates wrongly insinuated a well defined point estimate. 

In Table 1 we summarized the obtained Bayesian summary 
statistics. One can see that the intervals for the average duration of 
each phase a +)? are narrow compared to those for the individual 
parameters a and p. In both cases tiie data allows for a 
deterministic S phase (a = 0), while for the U87 data set variability 
in G2M is a necessary characteristic to reproduce accurately the 
data. Notably, when contrasting the two cell lines, are the short Gi 
phase of Chinese hamster cells and the approximately two times 
more extended G2M phase of the human glioblastoma cell line. It 
is out of the scope of this paper to interpret or relate these 
differences to cell line specific conditions. More importantly in this 
context is the fact that the information of the analyzed data is too 
sparse to narrow down all the parameter values even under noise- 
free conditions. 

Redesigned dual pulse-labelling assay 

The information extracted from the U87 and V79 data sets is 
apparently insufficient to pinpoint all six parameters related to the 
three phases of our simple cell cycle model. This is disappointing 
especially because the number of support points largely exceeds 
the three ideally required, and the support points seem to include 
at least for the U87 data set one at t = Q, a second at 
0<?< min(jS) and a third at t> max(jS). 

A potential explanation for this poor resolution in the estimates 
is the previously mentioned intermixing of the cell population 
clusters in the BrdU versus DAPI scatter plots compared to the 
ideal conditions discussed earlier. The cluster overlap in the data 
makes it impossible to measure directly the frequencies of most of 
the populations, including the cell cohorts described by Eq. 24. 

In order to approach the conditions assumed in the thought 
experiment by avoiding the loss of information caused by the 
intermixing, we devised an extension of the current single pulse 
protocol, which places a second pulse immediately before 
measuring or fixing each sample (see Fig. 4, top). The second 
pulse is expected to expose the cells with a further nucleoside 
analog that can be distinguished from the first one by FACS. 
Depending on the cell cycle kinetics and the length of the 
measuring period, the additional pulse increases the number of 
classifiable populations from four up to nine distinct populations. 

To appreciate the additional populations identified by double 
pulse labelling, data from a single pulse-chase labelling experiment 
was artificially colored, to mimic the expected FACS output from 
proliferating cells labelled according to the protocol described 
before. In Fig. 4, besides the gates defining the populations f'", f''', 
and fG,M, cells that have incorporated the second label are 
drawn in red. For the time immediately after the pulse (i.e., / = 0), 
no extra information is gained by the second pulse. However, 
already two hours later, one additional population can be 
discerned. Twelve hours after the first pulse, seven population, 
instead of three, can be recognized. Thus by resolving the four 
initial population according to the cell cycle phases, it is possible to 
measure the kinetics of nine subpopulations (f'"^{fs'5f&M}' 

fid Sf/d fid rid \ rn fru fU fU \ i 

^ ~*'l'^G,' ' 'g,MJ' 'g, t^Gl.Gl' 'g1,S' 'gI,G2MJ ' ^""^ 

^a2M~*{^a2M})- Because all these kinetics depend on the cell 
cycle parameters, each of them can in principle tell us something 
about the phase completions times. However some information is 
redundant. For example if and f g'' are measured, then Tqj 5 is 
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Figure 5. Analysis of simulated dual pulse labelling data. A: 

Average kinetics of unlabelled (dashed line) and labelled cell 
cohorts (colored lines) were computed from Eq. 25, using ML para- 
meter estimates from the U87 and the V79 data sets 
(U87: a = {7. 1,3.9,3.6}, )? = {1.9,4.1,2.1}, V79: a = {1.4,2.3,0.5},)? = 
{1.4,6.5,2.0}, units are hours). Support points and repeats were cfiosen 
according to the real experiments. Multinomial noise was added, 
mimicking the residuals found in the original data sets (see the 
Computational Methods section for more details). Finally, model 
solutions (lines) were fitted to the synthetic data sets (triangles). 
Best fit parameters (U87: a={7.5,3.2,4.5}, ^ = {1.7,5.3,2.0}, V79: 
a = {1.2,2.5,0.4},y? = {1.4,6.2,2.1}, units are hours) B: ML parameter 
estimates from simulated data. All ML regions converge to point 
estimates (arrows). Squares indicate parameters used for generating the 
data (see A). C: Bayesian bi-variate 99%-credibility regions for the 
parameters a and (} for each phase, based on the artificial data. 
doi:1 0.1 371/journal.pcbi.l 00361 6.g005 

defined by the total fraction of cells in S phase, because 
«s =fs' + fs' + fGl S- Similarly from fQ^y[, fG,M o'^^ can deduce 
^Gi GtM +fG,M' '^y knowing the frequency of cells in G2M phase. 

Double-label experiments using pairs of nucleoside analogs like 
BrdU, IdU and EdU, also in combination with radioactive 
tritiated thymidine ([^H]), have been explored in several cancer 
cell proliferation studies [19,28-31]. In recent years, dual pulse 
experiments using BrdU in combination with EdU have become 
more common. Studies relying on this method estimated changes 
in DNA replication, inferred mitochondrial DNA bio-genesis and 
stained proliferating cells in the bone marrow in vivo [32-34], in 
general with the aim to increase the statistical power of the 
conventional methods. 

To assess if the latter method would allow quantifying more 
accurately and precisely the parameters of the model, we 
generated in silico data mimicking the output of a hypothetical 
dual pulse experiment using Eq. 24 and Eq. 25 (see Fig. 5 A). We 
found that by employing the redesigned protocol with the same 
replicates and time points as in the corresponding data sets, we 
could reduce the regions corresponding to the ML up to point 
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Figure 6. Robustness of parameter estimates to empirical phase duration distributions that are not delayed exponential functions. 
A-B: Least-squares fitting of histograms predicted from a hypoexponential distribution with two decay and one delay parameter {ao, to 
measurements of phase durations using fluorescent biosensors [35], The number of cells that were tracked in the original study was around 15 cells. 
C: Best fit of the cell cycle model with delayed exponential completion time distribution densities to synthetic data generated from a model with 
hypoexponential completion time distribution densities for the Gi and S phase with parameters as in A and B. D: Recovery of the initial distribution 
densities (solid lines) using the delayed exponential model (dashed line). Both the average and the variability in the S phase completion time 
distribution (original average: 10.70 h, estimated average: 10.88 h; original std: 2.03 h, estimated std: 1.99 h) were estimated accurately. The data 
shown in A-B was read from the graphs in the original publication ([35]). 
doi:1 0.1 371/journal.pcbi.1 00361 6.g006 



estimates (Fig. 5 B). Furtliermore, tlie uncertainties due to noise 
became also significantly smaller (Fig. 5 C). Pooling this artificial 
data according to the output expected from a single pulse 
experiment, reproduced again the uncertainties seen in Fig. 3 C 
(not shown). Together this indicates that the redesigned dual pulse 
protocol provides parameter estimates with higher accuracy and 
precision. Real dual pulse labelling experiments will however be 
needed to confirm these theoretical predictions. 

Robustness of the estimates to other probability 
distributions of the phase completion times and to 
concurrent cell loss 

The cell cycle model introduced here is deliberately simple and 
neglects cell loss. In this section, we ask whether the estimates of its 
parameters are reasonable when some of the simplifying assump- 
tions of the model do not hold. Specifically, we ask how accurate 
are the mean and standard deviation of the phase completion 
times estimated using this simple model if the true completion 
times were not distributed as a delayed exponential function or if 
there was concurrent phase-specific cell loss. 

Empirical measurements [35] indicate that the cycle phase time 
for the S phase is distributed closer to a delayed hypoexponential 
or a delayed gamma distribution (see below) rather than the 
caricatural delayed exponential. Therefore, an important question 
which arises is how much do the estimates of the average and 
standard deviation in phase durations obtained with this simple 
model depend on the true underlying distribution? While many 
different scenarios could be tested we opted to fit a delayed 
hypoexponential density with two decay and one delay parameter 
to direct in vitro measurements of G| and S phase durations 



employing fluorescent biosensors (Fig. 6 A-B, [35]). Using the 
obtained best-fit estimates, we then performed in silico dual-pulse 
labelling experiments, in which the phase durations were drawn in 
the case of the S and Gi phase from delayed hypoexponential 
density functions (Fig. 6 C). Finally we fitted the simple model, i.e., 
Eq. 24 and Eq. 25, which is based on delayed exponential 
distributions, to this data, to see if we could recover the original 
averages and standard deviations despite using the 'wrong' 
caricatural model. Both summary statistics (i.e., mean, standard 
deviation) of phase durations could successfully be re-estimated 
(Fig. 6 D). Although generalizing this finding lies out of the scope 
of this article, it suggests that even if the true underlying 
distribution is not a delayed-exponential function, important 
quantities like the average and standard deviation of the phase 
durations may still be estimated with the simple model developed 
herein. It also indicates that BrdU labelling experiments with a 
realistic number of samples are unlikely to have the power to 
discriminate between delayed exponential and more complex 
density distributions. 

We now turn to the issue of how much the presence of phase- 
specific cell death (or loss in general), which is unaccounted for in 
our model, affects the accuracy of the estimates of the mean and 
standard deviation of the phase durations. To this end, we will first 
introduce the extensions necessary to describe cell death in the 
model. We rely on the fact that if the probability of death per cell 
cycle is less that 50%, the average population size wUl 
asymptotically grow exponentially with an effective growth rate 
V, where 0 < v < ^. This implies that the arguments used to analyze 
exponential growth without death remain valid for a model that 
allows moderate levels of cell death. 
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To consider death, we assume that cells have two possible fates 
per phase, either they progress to the next phase or they die. Let 
fxii), as before, be the phase completion time density, conditioned 
however on the cell being alive at time t. And let fi,(i) be the 
phase-specific time to death density conditioned on the cell having 
not progressed to the next phase. Then, as e.g., in [36], assuming 
that both events compete with each other (i.e., whatever fate 
happens first, prevents the other), the resulting density f(x,S){f) 
becomes 



f(rA{t)=f,Ri{i)+fiRM- 



(28) 



Consider now a scenario of an ("xponcntially growing popula- 
tion, in which cell death occurs exclusively during phase o. Let us 
assume further that the o— phase specific time to death density 
fs„{t) is a simple exponential density with mean p. Using straight- 
forward probabilistic arguments, we can compute analytically, for 
this simple scenario two important quantities, namely the 
probability to die in this phase {ps^), and the expected value of 
the effective completion time, distributed as_/(T,3)(?)- We get 

Pio= (\(x)i?.„(x)c/x= 1 - (-^\e-MP^ E[/-(,,,)(0] =m„. 
Jo \P + ao/ 



Note that, in this simple case, ps^ is also the probability to die 
per division cycle. 

Evaluating Eq. 18 using instead of/^^, we obtain for Eq. 

24, 



\+e I'o 



l—ePo- 



V 



(29) 



t > 



where /x^ and «^ represent the equivalents of p. and n", we had 
previously defined for the case of no cell loss. The former 
quantities, which now depend on p^, are derived applying to Eq. 5 
the same substitution as above. Expressions equivalent to Eq. 10 
and Eq. 1 1 are obtained along the same lines. These become 
however rather lengthy and are therefore omitted here. Eq. 29 
reproduces accurately /g" in simulated BrdU pulse labelling 
experiments, if death occurs, as specified above (see Fig. 7 A for 
an example with o = S and pas£{0.0,0.3}). The differences 
between the analytical predictions for /g" with 30% death and 
without death (denoted by A/|") are, for the parameter sets that we 
tested, relatively small, and vanish as expected, as p^^ tends to zero 
(see Fig. 7 B for A/g" computed at one specific time point (/ = 4 h) 
for different values of pi^). 

To further test, how much both cell death and a completion 
time with a shape distinct from a delayed exponential may jointly 
affect parameter estimates, we simulated BrdU pulse labelling 
experiments, where two major assumptions underlying Eq. 24 



were simultaneously violated. First, we assumed a delayed gamma 
distribution (with shape parameter of two) for the completion time 
of each phase. Second, we considered cell death during S phase, 
and adjusted Ps> such that pi^ was either zero or 0.3. The 
population size (starting with five cells) took about twice as much 
time to grow to a similar size for /;^j=0.3 compared to a the 
scenario without death (see Fig. 7 C, middle column, for five 
independent simulations). In addition, the variability in the 
population sizes between the simulations appeared higher for 
increased death rates. In contrast, when estimating the mean and 
variance of /^^ by non-linear least squares fitting using Eq. 24, the 
marked changes seen in the population kinetics where not 
paralleled by changes in the estimates. Both the mean and the 
variance were accurately determined in both cases (see Fig. 7 C, 
right column). Taken together, this suggests, that the estimates for 
the mean and variance of /t, using Eq. 24, at least for the 
reasonable parameter values that we tested, are relatively robust to 
simultaneous changes in the shape of the completion time, and 
moderate levels of cell death. 

Discussion 

In this article, we propose a simple stochastic model that aims at 
approximating the time it takes for a cell to accomplish the 
sequential phases of the cell cycle, by defining the completion time 
in each phase as a delayed exponential density distribution. At first 
sight this might seem a gross oversimplification of all the processes 
involved. However, when compared with experimental data, this 
simplistic model performs surprisingly well. 

While the observation that the model reproduces closely the 
experimental time series has to be interpreted with care, we think 
its success can be explained by the fact that the probability rule 
captures simultaneously two important regimes of complex 
biochemical processes that qualitatively differ in their completion 
time distribution. As was shown recently by Bel el al. [37] the 
completion time for a large class of complex theoretical 
biochemical systems, including models for DNA synthesis and 
repair, protein translation and molecular transport, simplify either 
to deterministic or to exponentially distributed completion times, 
with a very narrow transition between the two regimes depending 
on the rate parameters. These are precisely the 'ingredients' of the 
delayed exponential distribution. Under this light our model could 
be naively interpreted as a sensor that measures approximately the 
relative contribution of delay and decay processes in each of the 
cell cycle phases. However, whereas delays connected in series 
form again a delay, this is not true for decays. Sequentially coupled 
decays form a process with hypoexponential distributed comple- 
tion times with a shape similar to the frequency distribution of cell 
cycle phase completion time reported in [35] . Thus a more flexible 
model for the completion time of each phase could be a 
hypoexponential distribution of the family that we are currentiy 
using to model the total cell cycle length distribution (i.e., Ecj. 3). If 
instead, processes are not connected in an ordered series but 
rather concurrent, the times for sdl the processes to complete is 
dominated by the largest delay or the smallest decay parameter. 

It is tempting to interpret the relative weight of constant delay 
and exponential decay (i.e., the coefficient of variation) as a 
measure of the precision of the processes regulating each phase, 
which in turn might reflect a selective pressure on timing. Tighter 
pressure might reduce the coefficient of variation, as our results 
suggest for the S phase when compared to the remaining phases. 
Yet, this might also reflect the conjunction of many parallel and 
independent process such as replication forks whose number is 
expected to increase the timing precision by the law of large 
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Figure 7. Effect of cell death and completion time distribution on parameter estimates. A: Comparison of analytical predictions (lines, Eq. 
29) with simulated BrdU labelling experiment (squares). Cell death is assumed to occur exclusively during S phase with probability 0 (red) and 0.3 
(blue) respectively. Only the /g" population is considered. Parameters: a = {2.5, 2.0, 1.5},^ = {1.0, 5.0, 0.5}, units are hours. B: Difference between 
Eq. 29 (accounting for cell death) and Eq. 24 (neglecting cell death) at time / = 4 h (see dashed line in A), as a function of ps^. C: BrdU labelling 
experiments were simulated assuming gamma distributed phase completion times (red curve, graphs on left column) and cell death during S phase 
with probability ps^ =0 and ps^ =0.3 (green curve, graphs on left column). The effective completion time J\,^j) (gray density plot, left column), the 
population growth (middle column) and the estimation of the mean and the standard deviation of /j are shown for both cases. Approximate 
confidence intervals for the estimates are computed as 1.96 times the standard error. Even though and the population growth are strongly 
influenced by the value of ps,., both /I' and the estimates extracted from /g" are barely affected. The dashed lines in the middle column indicate the 
time of the first pulse, which was chosen such that the average population was similar in both scenarios. Parameters for gamma distributed 
completion time distribution of the three phases: shape: A:= {2.0, 2.0, 2.0}, scale: 6* = {1.25, 1.0, 0.75}, delay: ^= {1.0, 5.0, 0.5}. 
doi:l 0.1 371 /journal.pcbi.l 00361 6.g007 



numbers. In fact, the mean time and the variance of the S phase 
are shorter in the early phase of the embryo when cells display a 
higher number of replication forks in which the DNA polymerase 
progresses at the same rate [38]. 

An important simplification of our model consists in the 
assumption that cell loss by death, differentiation or immigration 
is small compared to population wide division rates, such that we 
can neglect it when fitting the model to experimental data. The 
main reason to adopt this approach was simplicity and the fact 
that the available data sets did hardly permit the determination of 
the possibly large number of additional parameters. While for the 
U87 LIFE/DEAD discrimination was performed, the markers 
used for gating are specific for late stages of apoptosis or necrosis 
typically after membrane integrity is lost and therefore do not 
necessarily reflect the true fraction of dying cells. The fraction of 
dead cells identified and excluded by this method was typically 
low. In case that experimental conditions would however suggest 
substantial cell loss, the model is flexible enough to be adapted 
without major technical difficulties, along the lines of Eq. 28 and 
Eq. 29. For instance, when the number of new-born cells equals 
the number of dying cells, solving the model analytically turns out 
to be easier, because ^ = 0. And given that the apoptotic state (e.g., 
defined by Annexin-V staining) would be measured simultaneously 
with nucleoside incorporation and DNA content, this could open 
up the possibility to assess the duration of apoptosis in vivo. These 



potential extensions not withstanding, it is reassuring that 
considering concurrent phase-specific cell death of up 30% may 
not change the estimates of the mean and standard deviation of 
the phase completion time obtained using a caricatural model that 
neglects cell death, as our results indicate. 

Another fundamental abstraction of our model is that the 
completion times for the cell cycle phases of a given cell are 
uncorrelated, which also implies uncorrelated division times of 
parental cells and siblings. Even though positive correlation in 
division times between parental and daughter cells [10] and 
between siblings [36] has been observed recently in vitro by direct 
long-term microscopy of activated proliferating B cells, Schultze et 
al. reported many years ago for in vivo murine crypt epitheUal 
cells the lack of correlation of completion times of a cell through 
successive phases [3 1] . It remains to be shown experimentally how 
much of the correlation or lack of correlation is due to cell type or 
environment. In any case, it would be interesting to extend the 
present model to include correlation in phase completion times. 

The live cell biosensor-based fluorescent imaging strategy 
exploited in [35] allows for direct quantification of the stochastic 
timing of the cell cycle phases. It is worth comparing the estimates 
of cell cycle phase-specific completion times obtained with this 
direct method with those provided by the indirect pulse labelling 
method. The mean S phase completion time was reported for the 
lines NCI-H292 and HeLa ceU line to be 8.2 and 8.4 hours 
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respectively with standard deviation of 0.5 and 2.9 hours 
(extracted from Fig.2 in [35]), which lie in the range of the 
estimates we obtained, despite the different human cell lines that 
have been analyzed (Table 1). In principle, pulse labelling with 
nucleoside analogs can be used in vivo to quantify the stochasticity 
of the cell cycle in anatomical places that are curr(;ntly not fi-asible 
to visualize by multiphoton microscopy, given that a sufficiendy 
large (over 1000) and representative sample of cells can be 
harvested. Our method therefore provides, concerning the Gi, S 
and G2M phases, very similar information as these imaging 
methods, yet it has a much wider application scope. 

In comparison with the Smith and Martin cell cycle model, that 
assumes a single variable phase [5], we have proposed a more 
complex model with three variable phases. A question can be 
raised whether a less complex model with variability in only one or 
two of the three phases would reproduce erjually well our BrdU 
pulse labelling data. This could simplify the analysis and reduce 
the issue of parameter identification. One might, for example, 
consider a scenario, similar to the double transition probability 
model analyzed in [39], in which the Gi and the G2 phase have 
delayed exponentially distributed durations, while the durations of 
S and M phase are frxed. It is easy to see that such a less complex 
model is embedded into our model, as it suffices to set (Xs = 0, 
while assuming that the variability in the duration of the G2M 
phase is generated entirely during the G2 phase. Clearly, from a 
data fitting perspective, and especially for the V79 data set, the 
simpler embedded model and the larger model would perform 
equally well. This can be read directly from Fig. 3, as the set of 
approximate ML estimates for as includes values that are equal or 
close to zero. However, the interpretation of the V79 data set 
based on these two models would be fundamentally different. For 
instance, by relying on the deterministic model, one would be lead 
to conclude that the S phase duration is for every cell about 
9 hours. By allowing however for ots > 0, possible interpretations 
of the data encompass the latter case, but in addition include 
scenarios in which some cells complete their S phase in about 
7 hours, while other cells may take far longer. Even though the 
original data does not permit to discriminate between these 
models, simulated dual pulse labelling experiments indicate that 
this is in principle feasible. Finally, in view of the experimental 
data provided by Hahn et al. ([35], used in Fig. 6 B), the scenario 
of variable S-phase duration with as > 0 is well justified. 

On the other hand, we distinguished only three cell cycle 
phases, although the cell cycle is typically structured into at least 
four biologically distinct phases. This simplification stems from the 
fact that quantification of DNA content by flow cytometry' cannot 
discriminate between cells in the G2 and M phase. Additional 
biomarkers, such as pS780 reported by Jaccoberger et al [40], 
could be used together with DNA content dyes and nucleoside 
analogs in extended labelling protocols to identify the four main 
cell cycle phases. Extending the model to distinguish accordingly a 
fourth phase would be rather straightforward mathemati[:ally. 
Despite restricting the model to three phases, it is worth noticing 
that we are extending the work of Cain and Chau [39,41], who 
studied both balanced and non-balanced growth conditions, 
assuming one and two random transitions, mapped respectively 
to part of Gi and the remaining cell cycle phases. Also, we extend 
the work of Larsson et al. [42] who were able to infer the variation 
in the completion times of S and G2 based on the histograms of 
DNA content. 

Long-term labelling with BrdU has been used in vivo to study 
disease progression of infected rhesus macaques with the simian 
immunodeficiency virus [1,43] and due to toxicity more rarely in 
HIV-1 [44]. These studies typically targeted turnover rates of T 



lymphocytes subpopulation over a time period of several weeks 
and provided average birth and death rate estimates. In contrast, 
the method oudined here measures cell proliferation at a much 
short timescale (12 — 24 hours) and has the potential to yield 
phase specific estimates of both the average and the variability of 
completion times. We anticipate that valuable complementary 
information about SIV and HIV infection could be gained using 
the redesigned protocol proposed here, especially in the light of the 
known modulation of the cell cycle checkpoints by accessory viral 
proteins [45]. 

Recently, in a computational 'tour de force', Falcetta et al. 
[25] used a stochastic model of cell cycle progression with 
discrete age-structure to derive qualitative conclusions about the 
mechanism of action of several anti-cancer therapies. This 
model was able to mimic (in their wording 'rendering') 
quantitative data on single BrdU pulse labelling assay and 
time-lapse imaging. The empirical distribution cell cycle lengths 
they reported is akin to the hypoexponential family in our 
model, however, the distributions of phase lengths remain 
implicit in their simulation framework, in which time is discrete 
and the parameters are transition probabilities per time step. 
This prevents knowing how uncertain are the estimates of the 
phase length variances based on single pulse labelling using their 
approach. 

Dual pulse labelling with a pair of thymidine analogs has been 
used before to study cell cycle kinetics [19,28,31]. What is 
common to those studies is scheduling the two consecutive pulses 
by fixing the time lapse between the pulses, irrespectively of the 
time at which cell samples are collected for cell cycle phase 
analysis. It is worth stressing that, according to the present study, 
specially when the second pulse is timed according to each 
individual sample (i.e. adjusting accordingly the interval between 
pulses) one can harness the potential of the model to quantify the 
mean and variance of the phase-specific time. Alaking the second 
pulse at a fixed minimal time befor(; colkxting cells for analysis 
allows to resolve ceUular cohorts, which would otherwise be 
confounded. 

New technologies like the one developed by Hahn et al. [35] but 
also the ubiquitination-based cell cycle indicator, termed 'Fucci' 
[46] will greatly increase our understanding of phase resolved cell 
cycle progression and unveil its epigenetic and stochastic 
variability in isogenic cell populations. To translate this knowledge 
gained mainly from in vitro cell cultures into an in vivo context, 
long term (greater than 12 hours) and continuous multi-photon 
imaging may be required. This however is technically very 
demanding, and may remain prohibitive for cells deep inside 
tissues despite major technological advances in the field. The 
methodology presented here allows to measure phase specific cell 
cycle progression variability in vivo by relatively simple technical 
means. Even though nucleoside analogs are potentially carcino- 
genic, the adverse effects of low dose pulse labelling remain 
typically undetectable. Determining accurately cell cycle progres- 
sion variability in mouse models of cancer might become a crucial 
step in understanding the high variability in susceptibility to cell 
cycle specific anti-cancer drugs. 

Materials and Methods 

Stability analysis 

Here we wiU show that a cell population that follows the 
stochastic model specified before wiU eventually enter a stationary 

exponential growth phase. The requirement for such an asymp- 
totic behavior is, recalling Eq. 11, that the complex valued 
function 
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.ft/') 



(30) 



has for positive valued elements of the vectors a and P a unique 
positive real root which represents the upper bound of the real part 
of any of its other potentially infinite number of roots. The 
complex numbers ji that solve Eq. 30 correspond, according to our 
model, to the stationary phase growth rate of the proliferating cell 
population. In case that f-i is real, the population is growing 
exponentially, while if /.i is purely imaginary growth is oscillating. 
In general, roots have both non-zero real and imaginary parts, 
which leads to oscillations with growing or decaying amplitude. If 
for real x and y we write /i = x + iy the real and imaginary part of 
Qip.) are computed as 

Re(0 = 2 - e^''(i/'i cos (By) - 1/^2 sin (By)), 



lm(Q)=-e''^^2C0s(By) + <Pi sm(By)), (31) 



where 



^, = U(\ 



+ a,-x:)-r 



-3x 



n 
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(32) 
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Figure 8. Stability analysis. Q{x + iy} as a function of r>0 for fixed 
values of .y>0. For v = 0 (green circle) the real part of Q takes, 
depending on .TefO.oc], a value in the interval [1, — x>]. The values for x 
are increasing from A-D, while a and p remain unchanged. For relatively 
low values of .t (A-B) the real part Re(0 is positive for r = 0. After one 
or several turns, i.e by increasing y the spiral can potentially cross the 
origin only once (empty circle). In A the spiral misses the origin, while in 
B the spiral crosses the origin after one turn. Crossing of the origin 
means that the corresponding complex number n = x + iy is a root of Q. 
In C the spiral starts at the origin. This represents the only real positive 
root of Q. For initially negative values of Re(0 (D) the spiral can never 
cross the origin because the distance to the center point (gray circle) is 
already in the beginning for j = 0 larger than the distance between the 
latter and the origin. By increasing y this distance will even grow further 
according to Eq. 33. 
doi:1 0.1 371/journal.pcbi.1 00361 6.g008 



For to be a root of Q, both real and imaginary part have to 
vanish.We restrict our analysis to the positive complex half 
plane, i.e. x>0, since we are interested in growing and 
not contracting cell populations. Due to the symmetries 
in the trigonometric functions sin (— 3^) = — sin (j) and 
cos(-y) = cos(y) and i/'i(->') = i/'i(j') and i/(2(->')= -lAjCv) 
one can easily see that if ii = x + iy is a root, its complement 
fi*=x — iy is also a root. We can thus reduce the analysis even 
further to values with positive imaginary parts. If for fixed x we 
plot Q in the complex plane as a parametric function of >'e[0,oo] 
we get a spiral with the distance from a center point c = 2 + iO 
given by 

r=^J{Rt(Q)-2)^ + lm(Q)\ 



= e^yn((l + a,-x)2 + a2;'2). (33) 

Crucially, as r is a monotone increasing function of the spiral 
never crosses itself For y = 0 the imaginary part of Q vanishes as 
expected because lim,._,o sin (wy) = 0 and limj._>o i/'2 = 0. For this 



special case Re(0 = 2— TT, (e'^'''-|-(Xi/ieft'') is obviously mono- 
tone decreasing with x and restricted to the interval [1 , — co] . This 
means that the spiral can only 'start' in the interval between one 
and minus infinity. Taken together, this implies that if for y = 0 
and fixed x, the real part of Q is positive, then there exist a single 
'opportunity' to cross the origin, while if negative there exists none. 
At the border where the real part is zero (Fig. 8 C), the 
corresponding value of x is the only positive real root. Due to the 
monotonicity of Re(2) any value of x greater than the positive 
real root will result for _v = 0 in Re(2) < 0 which does not admit for 
any solution. The different possible scenarios are exemplified in 
Fig. 8. 

Experimental methods 

Cell culture. Human astrocytoma cells U-87 MG (ATCC- 
LGC) were routinely cultured with Dulbecco's modified Eagles 
medium (DMEM, Biochrom AG) supplemented with non- 
essential amino acids (NEAA, Invitrogen GmbH), heat-inactivated 
fetal bovine serum (FBS, 10%, Biochrom AG) and additives 
(peniciUin-streptomycin-glutamine, Invitrogen GmbH) in plastic 
flasks (TPP AG) at 37°C in 5% C02-humified incubators and were 
passaged twice a week using Dulbecco's PBS (DPBS, Apotheke 
Innenstadt Uni Mnchen) and Trypsin/EDTA (Biochrom AG) 
before reaching confluence. 
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Treatment with BrdU. For cell cycle analysis cells 
(2.0x10* cm~^) were seeded in 75 cm^ culture flasks and 
incubated for 24 h followed by the BrdU pulse. For this 

purpose, medium was replaced by medium supplemented with 
BrdU (10 ;i(M, Bromodeoxyuridine, Becton Dickinson GmbH), 
cells were incubated for 30 min at 37°C followed by washing 
away of BrdU for two times with fresh medium. Cells were 
then again incubated at 37°C for a designated period of time 
(0 h, 2 h, 4 h, 6 h, 8 h, 12 h) to measure proliferation over 
12 h. 

Preparation of samples. Collecting of cells was performed 
by trypsinization using DPBS, Trypsin/EDTA and medium 
followed by washing of cells in DPBS. To exclude dead cells from 
the analysis staining of dead cells was performed. For this purpose 
cells were incubated for 30 min with fluorescent dye (LIVE/ 
DEAD Fixable Green Dead Cell Stain Kit, Invitrogen) according 
to the manufacturers instructions followed by washing with DPBS. 
Consequent steps of sample preparation were processed using the 
APC BrdU Flow Kit (Becton Dickinson GmbH). CeUs were 
washed once with Perm/Wash Bufier and fixed for 30 min on ice 
with Cytofix/Cytoperm Buffer. After washing with BD Perm/ 
Wash Buffer cells were resuspended in Cytoperm Plus Buffer and 
incubated on ice for 1 0 min foUowed by washing with Perm/Wash 
Buffer and incubation in Cytofix/Cytoperm Buffer for 5 min on 
ice. Cells were then washed with Perm/Wash Buffer and 
incubated with 2M HCl-Triton (1%) for 30 min at room 
temperature followed by washing twice with Perm/Wash Buffer. 
For detection of incorporated BrdU cells were incubated with 
diluted (1:50) fluorochrome-conjugated anti-BrdU antibody for 
20 min at room temperature. Cells were then washed with BD 
Perm/Wash Buffer and further incubated with DAPI (0.5 jUg/ml 
in staining buffer: 100 mM Tris, pH 7.4, 150 mM NaCl, 1 mM 
CaCla 0.5 mM MgClj 0.1% Nonidet P-40) for 30 min at room 
temperature. All samples have subsequently been stored on ice 
until acquisition. 

Acquisition and analysis. Acquisition of data was per- 
formed by measuring fluorescence intensity using a BD LSR II 
Cytometer at the excitation wavelength of 660 nm for APC and 
450 nm for DAPI and the software BD FACSDiva. 

Computational methods 

Modeling and simulations. Anti-derivatives, equations as 
well as eigenvalue problems were solved with the help of 
Mathematica. Stochastic simulations, Markov chain Monte Carlo 

and optimization algorithms (e.g. least square fitting and Newton- 
Raphson root finding) were implemented in C-H-. To fit the 
paramete rs of the model to the data we relied on the population 
based covariance matrix adaptation evolution strategy provided by 
die C-l-H library SHARK [47]. 

In silica data. In order to anticipate and compare the 
information content in data sets that could potentially be acquired 
according to the dual-pulse protocol, in silico data was generated. 
The simulated data consisted of frequencies computed according to 
our model using ML parameter estimates. Noise was added to the 
frequencies by simulating a sampling process with replacement with 
frequencies given by the model and a population size of 300 and 600 
for the U87 and the V79 data set respectively. This reproduced 
approximately the variability observed in the original data sets. To 
make comparison with available data reasonable support points 
were taken to be the same as in the respective data set. 

Bayesian inference. When estimating, by FACS analysis, 
frequencies of cells in different phases of the cell cycle, 
measurement noise becomes unavoidable. Potential sources of 
noise include variability in experimental conditions, gating errors. 



stochasticity in cell division, FACS measurement errors, and many 
more. Here we describe an attempt to account, in a simple way, 
for the observed experimental noise by taking a Bayesian 
approach. This provides us not only with maximum likelihood 
estimate regions of the model parameter, but in addition will give 
us an idea about the uncertainty that we have about the parameter 
values. 

Even though considering all potential sources of noise would be 
most consistent, the resulting probability model would become far 
more complex than our initial cell cycle model. To avoid this 
overload we assume that a relatively simple ad hoc multivariate 
probability density function approximates reasonably well the 
average and the noise in the observed frequencies at a single time 
point. This probability density function, which corresponds to the 
likelihood Vi of a single measurement event is defined by 

V.(,hi\a,p,N; td = r(iV) U ^ , (34) 

where T is the Euler gamma function. The right-hand side of 
Eq. 34 corresponds to a continuous approximation of a scaled 
multinomial distribution with support Xy6[0,l] and '^Xj=l [48]. 
The parameter N, which determines the spread of the distribution, 
can be interpreted as an effective population size. Tsiking e.g., a 
sample of size N from a population of cells containing k sub- 
populations with proportions given by w, yields frequencies with a 
probabiht)' density approximately distributed accordingly. If N is 
small the density distribution is broad, while if N becomes large 
the density distribution becomes narrow. 

Following in general terms the notation in the main text, the hij 
denote the k measured population frequencies from experiment / 
and the «y stand for the corresponding frequencies predicted by 
the cell cycle model. The latter obviously depend on the parameter 
vector a and fi and the time f,. 

Having defined the likelihood Vi for an outcome of a single 
pulse labelling experiment, the likelihood for the outcomes of a set 

of m experiments is the product V = n: ^ I Vi under the 
reasonable assumption that noise in a specific experiment is 
independent of aU the other experiments. By numerically inverting 
V, using Bayes theorem, one can obtain the posterior and 
subsequently the uncertainty o\'er the model parameter given the 
data, the model and prior knowledge. 

To estimate the maximum likelihood regions, the posteriors and 
the uncertainties in the a and yff for the U87 and V79 data sets, we 
implemented in C++ the adaptive Markov-Chain-Monte-Carlo 
algorithm proposed in [49]. The estimates for the maximum 
likelihood regions are obtained by fixing iV to a very large value 
(e.g., 10'). For Bayesian inference, N was considered as an 
additional parameter. For simplicity, improper priors uniformly 
distributed over the positive real number were assumed for all 
parameter. The first 10^ steps of the initially lO' step-long chains 
were discarded, and of the remaining chains every lOOO'th step 
was included in the subsequent analysis. The credibility regions 
were computed from the resulting MCMC chains using the 
'HPDregionplot' routine in the R package 'emdbook', and 
convergence of the chains were confirmed using the Gelman 
convergence test. 
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